Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Learning Top-k Transformation Rules

Internal identifier: 000464 (Main/Exploration); previous: 000463; next: 000465

Authors: Sunanda Patro [Australia]; Wei Wang [Australia]

Source:

RBID : ISTEX:0E53B146AE762B16D3A5D89E42E870FCD55FC2D6

Abstract

Record linkage identifies multiple records that refer to the same entity even when they are not bit-wise identical. It is thus an essential technology for data integration and data cleansing. Existing record linkage approaches rely mainly on similarity functions based on the surface forms of the records, and hence cannot identify complex coreferent records. This seriously limits their effectiveness. In this work, we propose an automatic method to extract the top-k high-quality transformation rules from a given set of possibly coreferent record pairs. We propose an effective algorithm that performs careful local analyses of each record pair and generates candidate rules; the algorithm then chooses the top-k rules based on a scoring function. We have conducted extensive experiments on real datasets, and our proposed algorithm has a substantial advantage over the previous algorithm in both effectiveness and efficiency.
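
The pipeline the abstract outlines (derive candidate rules from each record pair, score them, keep the best k) can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify the local analysis or the scoring function, so the sketch assumes a deliberately naive analysis (the tokens the two records do not share become one candidate rule) and uses the number of supporting pairs as the score. The function names and the sample records are hypothetical.

```python
from collections import Counter


def candidate_rule(left, right):
    """Naive local analysis (an assumption, not the paper's method):
    drop the tokens both records share and treat what remains on each
    side as a single candidate rule lhs -> rhs."""
    a, b = left.lower().split(), right.lower().split()
    shared = set(a) & set(b)
    lhs = " ".join(t for t in a if t not in shared)
    rhs = " ".join(t for t in b if t not in shared)
    # A rule needs something on both sides to be a transformation.
    return (lhs, rhs) if lhs and rhs else None


def top_k_rules(pairs, k):
    """Score each candidate rule by how many record pairs produce it
    (an assumed scoring function) and return the k best."""
    counts = Counter()
    for left, right in pairs:
        rule = candidate_rule(left, right)
        if rule is not None:
            counts[rule] += 1
    return counts.most_common(k)


# Made-up record pairs, for illustration only.
pairs = [
    ("proc vldb 2005", "proceedings vldb 2005"),
    ("proc icde 2007", "proceedings icde 2007"),
    ("intl conf data", "international conf data"),
]

best = top_k_rules(pairs, 1)
```

With these three pairs, the rule "proc" -> "proceedings" is supported twice and ranks first. A realistic system would need a far more careful alignment step and a score that penalises rules that are too specific or too general.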

Url: https://api.istex.fr/document/0E53B146AE762B16D3A5D89E42E870FCD55FC2D6/fulltext/pdf
DOI: 10.1007/978-3-642-23088-2_12


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Learning Top- k Transformation Rules</title>
<author>
<name sortKey="Patro, Sunanda" sort="Patro, Sunanda" uniqKey="Patro S" first="Sunanda" last="Patro">Sunanda Patro</name>
</author>
<author>
<name sortKey="Wang, Wei" sort="Wang, Wei" uniqKey="Wang W" first="Wei" last="Wang">Wei Wang</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:0E53B146AE762B16D3A5D89E42E870FCD55FC2D6</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1007/978-3-642-23088-2_12</idno>
<idno type="url">https://api.istex.fr/document/0E53B146AE762B16D3A5D89E42E870FCD55FC2D6/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001184</idno>
<idno type="wicri:Area/Istex/Curation">001114</idno>
<idno type="wicri:Area/Istex/Checkpoint">000121</idno>
<idno type="wicri:doubleKey">0302-9743:2011:Patro S:learning:top:k</idno>
<idno type="wicri:Area/Main/Merge">000470</idno>
<idno type="wicri:Area/Main/Curation">000464</idno>
<idno type="wicri:Area/Main/Exploration">000464</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Learning Top- k Transformation Rules</title>
<author>
<name sortKey="Patro, Sunanda" sort="Patro, Sunanda" uniqKey="Patro S" first="Sunanda" last="Patro">Sunanda Patro</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Australie</country>
<wicri:regionArea>University of New South Wales</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Australie</country>
</affiliation>
</author>
<author>
<name sortKey="Wang, Wei" sort="Wang, Wei" uniqKey="Wang W" first="Wei" last="Wang">Wei Wang</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Australie</country>
<wicri:regionArea>University of New South Wales</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Australie</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2011</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">0E53B146AE762B16D3A5D89E42E870FCD55FC2D6</idno>
<idno type="DOI">10.1007/978-3-642-23088-2_12</idno>
<idno type="ChapterID">12</idno>
<idno type="ChapterID">Chap12</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Record linkage identifies multiple records referring to the same entity even if they are not bit-wise identical. It is thus an essential technology for data integration and data cleansing. Existing record linkage approaches are mainly relying on similarity functions based on the surface forms of the records, and hence are not able to identify complex coreference records. This seriously limits the effectiveness of existing approaches. In this work, we propose an automatic method to extract top-k high quality transformation rules given a set of possibly coreferent record pairs. We propose an effective algorithm that performs careful local analyses for each record pair and generates candidate rules; the algorithm finally chooses top-k rules based on a scoring function. We have conducted extensive experiments on real datasets, and our proposed algorithm has substantial advantage over the previous algorithm in both effectiveness and efficiency.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Australie</li>
</country>
</list>
<tree>
<country name="Australie">
<noRegion>
<name sortKey="Patro, Sunanda" sort="Patro, Sunanda" uniqKey="Patro S" first="Sunanda" last="Patro">Sunanda Patro</name>
</noRegion>
<name sortKey="Patro, Sunanda" sort="Patro, Sunanda" uniqKey="Patro S" first="Sunanda" last="Patro">Sunanda Patro</name>
<name sortKey="Wang, Wei" sort="Wang, Wei" uniqKey="Wang W" first="Wei" last="Wang">Wei Wang</name>
<name sortKey="Wang, Wei" sort="Wang, Wei" uniqKey="Wang W" first="Wei" last="Wang">Wei Wang</name>
</country>
</tree>
</affiliations>
</record>
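
As a rough illustration of how the record above might be read programmatically, the following sketch pulls the title, authors, DOI and year out of a shortened copy of it using Python's standard library. The excerpt is an assumption made for the example: it omits the `wicri:`-prefixed attributes and elements, because the record as shown does not declare that namespace and a namespace-aware parser would likely reject it. The `summarize` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the record, without the wicri:-prefixed parts.
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Learning Top- k Transformation Rules</title>
<author><name first="Sunanda" last="Patro">Sunanda Patro</name></author>
<author><name first="Wei" last="Wang">Wei Wang</name></author>
</titleStmt>
<publicationStmt>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1007/978-3-642-23088-2_12</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def summarize(record_xml):
    """Return the main bibliographic fields of a record as a dict."""
    root = ET.fromstring(record_xml)
    title_stmt = root.find(".//titleStmt")
    pub_stmt = root.find(".//publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        # ElementTree's XPath subset supports [@attr='value'] predicates.
        "doi": pub_stmt.findtext("idno[@type='doi']"),
        "year": pub_stmt.findtext("date"),
    }


summary = summarize(RECORD)
```

To process the full record unchanged, one option would be to wrap it in an element that declares the `wicri` prefix with a placeholder namespace URI; the real URI is not given in this page.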

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000464 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000464 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:0E53B146AE762B16D3A5D89E42E870FCD55FC2D6
   |texte=   Learning Top- k Transformation Rules
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024